What is your corpus, why did you choose it, and what do you think is interesting about it?
Synthwave (also called outrun, retrowave, or futuresynth) is an electronic music microgenre that is based predominantly on the music associated with action, science-fiction, and horror film soundtracks of the 1980s. Other influences are drawn from the decade’s art and video games. Synthwave musicians often espouse nostalgia for 1980s culture and attempt to capture the era’s atmosphere and celebrate it. (from Wikipedia)
I chose this corpus because I am currently listening to a lot of synthwave music. I also listen to all my music on Spotify, so I can use my playlists to build the corpus quickly. I am also working on a synthwave rhythm game in Unity, so exploring this corpus could help with that side project.
I have selected the following artists, which should represent the following subgenres:
What are the natural groups or comparison points in your corpus and what is expected between them?
I intend to divide the corpus by subgenre within synthwave. These subgenres are similar to the subgenres within heavy metal, but likely a lot more subtle. It would therefore be interesting to see whether they are actually detectable within the corpus. I personally do not think there are very significant differences between most of these subgenres, but we will see whether the data agrees. I want to approach this by selecting five of the most distinct synthwave artists I enjoy listening to, and by checking whether these differences are perceivable with the music visualization methods covered in the course.
How representative are the tracks in your corpus for the groups you want to compare?
I will use a couple of playlists for each artist to build the corpus, making sure every artist takes up a (close to) equal share of it. This ensures that all artists are equally represented. For subgenre detection, representativeness is debatable. It is almost universally agreed upon that Home is one of the OGs of vaporwave, and Jan Hammer is a very respected soundtrack artist in the scene. The other subgenre assignments are probably more debatable, but the same goes for the more niche subgenres of hardcore rock, so I believe the experiment will still be compelling.
Identify several tracks in your corpus that are either extremely typical or atypical. Why do you think that they are so typical or so strange?
Turbo Killer by Carpenter Brut is one atypical outlier. It is by far the most intense and fast-paced track in the entire corpus, even by Carpenter Brut standards. I also think that Resonance by Home may be an outlier. It has very odd sounds, even by vaporwave standards, and I do not think anything else in the corpus sounds close to it. Respirate (Downtown Binary Remix) is the final outlier in this corpus. This is because it is a remix of a song not originally created by Downtown Binary, but I think it still retains the Downtown Binary style, so I expect it to be interesting to analyze.
Source & Reproducibility
The whole project uses the Spotify API as its source, and via the GitHub “Source code” button, all code used to generate the visualizations can be checked for accuracy and methodology.
About the playlists
The corpus playlist contains all the tracks used in the analysis. I chose to also include the old corpus, because it offers a much better listening experience: it has a lot more variety, taking the best songs from each album, while the new corpus includes more albums in their entirety.
The basics
When working with a new dataset, it is often a good idea to create some basic plots to get a feel for the data before diving into the actual research. The following two histograms give a crude visualization of some properties of the chosen corpus.
Low popularity
Let's start off by looking at the popularity of the songs in the corpus. The Spotify API assigns each track a popularity value from 0 to 100. Most songs have a popularity of about 50, with some outliers close to the minimum and maximum. The corpus is surprisingly popular for a genre that is not well known (or maybe it is?). It would be nice if Spotify explained how it determines popularity.
Low speechiness
This histogram makes clear that speechiness is very low across the corpus. This makes a lot of sense, because most tracks in the corpus do not contain any vocals. What the plot does show is that the expected speechiness values and the values Spotify provides do indeed match up, which is good.
Vibe checking
Energy and valence can convey the mood of songs. Active songs have high energy, while passive tracks have low energy. The plot clearly shows that almost all tracks in the corpus have high energy, with Carpenter Brut having the highest average energy. The song with the lowest energy is Night Talk, with an energy score of 0.258.
Valence measures positivity, with high valence being positive and low valence negative. Compared to energy, valence varies much more across the corpus, but it is low on average. Jan Hammer seems to have the highest-valence tracks, and the single highest valence score, 0.96, belongs to Boat party. The combination of low valence and high energy would map to ‘nervous’. While this is definitely how I would describe Carpenter Brut, The Midnight and Downtown Binary do not sound nervous, so classifying vibe by these two metrics alone may be naive. Carpenter Brut also has high loudness, for example, which likely contributes to its nervous vibe. That being said, Jan Hammer being classified as the most positive does concur with my own interpretation.
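The quadrant reasoning above can be made concrete with a small sketch. Assumptions: both features are on Spotify's 0 to 1 scale, the split point is 0.5, and the labels 'happy', 'calm' and 'sad' are illustrative additions; only 'nervous' comes from the text above. The function name `vibe_quadrant` is hypothetical.

```python
def vibe_quadrant(valence: float, energy: float, threshold: float = 0.5) -> str:
    """Map Spotify-style valence and energy (both in [0, 1]) to a mood quadrant.

    ASSUMPTION: the 0.5 split and the four label names are illustrative,
    loosely following the valence/arousal mood model; they are not Spotify's.
    """
    if not (0.0 <= valence <= 1.0 and 0.0 <= energy <= 1.0):
        raise ValueError("valence and energy must lie in [0, 1]")
    high_energy = energy >= threshold
    positive = valence >= threshold
    if high_energy:
        return "happy" if positive else "nervous"
    return "calm" if positive else "sad"


# Hypothetical (valence, energy) pairs, not values from the corpus.
print(vibe_quadrant(0.3, 0.9))  # nervous
print(vibe_quadrant(0.8, 0.7))  # happy
```

Because the label depends on just two numbers, loudness, tempo and timbre play no role at all, which is exactly why this kind of mapping can be naive.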
Classification
It is good that most songs by each artist hover around the same valence and energy, meaning that these two features already give useful information about the artists. Unfortunately, there is a lot of overlap between the artists, so this data is probably not enough on its own to guess the artist of a given song. While that is a shame, there are many other features, like loudness, that can add dimensions to the data, so this result still looks promising.
Exploring loudness further
The previous plot suggested that there may be a relationship between energy and loudness. To check this, the plot here shows loudness against energy, with a fitted line. While there are definitely outliers, the two features clearly exhibit a linear relationship. Spotify is not very open about how it determines its feature values, but this plot gives some idea of how energy is calculated.
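A minimal sketch of the kind of line fit described above: ordinary least squares plus Pearson's r, in plain Python. The exact fitting method behind the plot is not stated, so OLS is an assumption, and the data below is made up purely to illustrate the calculation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept, plus Pearson's r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical loudness (dB) and energy values, not taken from the corpus.
loudness = [-12.0, -9.0, -6.0, -3.0]
energy = [0.40, 0.55, 0.70, 0.85]
slope, intercept, r = fit_line(loudness, energy)
print(f"energy ≈ {slope:.3f} * loudness + {intercept:.3f}  (r = {r:.2f})")
```

An r close to 1 (or -1) indicates a strong linear relationship. On the real data, the outliers mentioned above would pull r below 1.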

Comparing original to remaster
Jan Hammer’s most iconic soundtrack piece has to be ‘Crockett’s Theme’ from Miami Vice. Miami Vice is quite old now (it originally aired in 1984), but Jan Hammer’s work on the soundtrack is great and went on to inspire many modern synthwave artists. Jan Hammer recently (2018) released a remaster of ‘Crockett’s Theme’ on the ‘Special edition’ album. I like the remaster better, but it is only very subtly different from the original, which makes it probably the perfect candidate for this comparison.
Analysis
The plot shows that the original and the remastered soundtrack are indeed very similar.
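The text does not say how the plot compares the two recordings. One common way to compare two versions of the same piece is to align their chroma features with dynamic time warping (DTW), which tolerates small timing differences such as a remaster that holds a note slightly longer. The sketch below assumes each recording is available as a sequence of 12-bin chroma vectors (for example, the segment `pitches` from Spotify's audio analysis) and uses cosine distance. Both choices are assumptions, and the toy one-hot vectors are not real data.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two chroma vectors (0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def dtw_cost(seq_a, seq_b, dist=cosine_distance):
    """Total cost of the cheapest monotonic alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: advance in A only, in B only, or in both.
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def one_hot(pitch_class):
    """Toy 12-bin chroma vector with all energy in one pitch class."""
    return [1.0 if pc == pitch_class else 0.0 for pc in range(12)]


# Toy sequences: the "remaster" holds some notes longer than the "original".
original = [one_hot(0), one_hot(4), one_hot(7)]
remaster = [one_hot(0), one_hot(0), one_hot(4), one_hot(7), one_hot(7)]
different = [one_hot(2), one_hot(5), one_hot(9)]
print(dtw_cost(original, remaster))   # 0.0
print(dtw_cost(original, different))  # 3.0
```

A low alignment cost despite different lengths is what "very similar" would look like under this approach.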

Turbo Killer Analysis
I wanted to explore the song ‘Turbo Killer’ by ‘Carpenter Brut’. This song really stands out for its quick pacing and for constantly building and escalating on the previous ‘verse’. It gives a sense of progression and fast movement, which is probably why it is one of my favorite tracks in this corpus.
The chroma matrix shows a lot of small changes in the first forty seconds, with a change roughly every seven seconds. After 40 seconds the song switches to high-tempo guitar only, and each following block adds further elements. The matrix turns out to be very interesting, because you can see how the song constantly builds up in complexity.
The Timbre matrix looks less interesting. There are no real verses, but you can tell where transitions to more complexity happen. What is very surprising is that the high point of the timbre matrix is at the end of the song.
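The chroma and timbre matrices discussed above are self-similarity matrices: entry (i, j) compares segment i of the track with segment j, so repeated material shows up as bright blocks and lines. A minimal sketch, assuming cosine similarity and 12-dimensional per-segment vectors; the same function works for timbre vectors as well as chroma. The toy frames are illustrative, not taken from Turbo Killer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors; 0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else dot / norm


def self_similarity(frames):
    """Entry [i][j] compares frame i with frame j of the same track."""
    return [[cosine_similarity(a, b) for b in frames] for a in frames]


# Toy chroma frames: a C-major chord, the same chord again, then a G-major chord.
c_major = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]
g_major = [0, 0, 1.0, 0, 0, 0, 0, 1.0, 0, 0, 0, 1.0]
ssm = self_similarity([c_major, c_major, g_major])
for row in ssm:
    print([round(v, 2) for v in row])
```

The two identical frames form a block of maximal similarity, while the G-major frame only partly matches them (they share the note G).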
Resonance Analysis
yes
Comparing
dissimilar
Chord Analysis
I have chosen Days of Thunder and Above All from the corpus for chordogram analysis. Both tracks are very consistent in their sound, which should mean both have long stretches of the same chord. The plot confirms this: both songs have long blocks in which the same chord is played.
Both songs seem to start with rapid chord changes, have long stretches in the middle, and end with rapid chord changes again. What is surprising is that Days of Thunder seemingly has a single very long chord block from around 80 to 180 seconds. On closer inspection it consists of two blocks split around 130 seconds, but when listening to the song I hear more differences within this stretch than that. These differences are apparently not strong enough to show up clearly in the plot. At 180 seconds the chords change slightly, which is audible when listening to the song, but according to the plot the difference is only small.
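A chordogram scores each chroma frame against a set of chord templates; long runs with the same best-scoring chord are the blocks described above. The sketch below assumes binary major and minor triad templates and cosine similarity, which may differ from the templates and distance behind the actual plot (seventh chords, for instance, are not covered here). The frames are toy data.

```python
import math

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_template(root, minor=False):
    """Binary 12-bin template for a major or minor triad on `root` (0 = C)."""
    third = 3 if minor else 4
    members = {root % 12, (root + third) % 12, (root + 7) % 12}
    return [1.0 if pc in members else 0.0 for pc in range(12)]


# 24 templates: every major and minor triad.
TEMPLATES = {
    PITCH_NAMES[root] + ("m" if minor else ""): triad_template(root, minor)
    for root in range(12)
    for minor in (False, True)
}


def cosine_similarity(a, b):
    """Cosine similarity; 0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else dot / norm


def chordogram(frames):
    """One dict per chroma frame: chord name -> match score (higher = better)."""
    return [{name: cosine_similarity(f, t) for name, t in TEMPLATES.items()} for f in frames]


def best_chords(frames):
    """Best-matching chord per frame; runs of equal labels form the plot's blocks."""
    return [max(scores, key=scores.get) for scores in chordogram(frames)]


frames = [
    [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0],  # C, E, G
    [1.0, 0, 0, 0, 1.0, 0, 0, 0, 0, 1.0, 0, 0],  # A, C, E
]
print(best_chords(frames))  # ['C', 'Am']
```

Because a template only looks at which pitch classes are present, changes in instrumentation or texture leave the scores unchanged, which is one possible reason why audible differences barely register in a chordogram.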
Verdict
It is odd that Above All has more chord changes according to the plot, while sounding much less complex when listening to the track. Either chord analysis is not the best way to classify tracks, or these specific tracks are not well suited to it.
Tempo Analysis
Nexus by Downtown Binary gives interesting results in tempo analysis: it has three very distinct phases. The intro phase spans the first 90 seconds of the song and has a strongly fluctuating tempo, which produces a very noisy tempogram. The second phase lasts until 115 seconds and has a more clearly defined BPM, as it slowly bridges the intro and the third phase. The third phase has a very clear line at 105 BPM and a less clear one at 158 BPM.
Hang’em all is an interesting song to compare with Nexus, because Nexus is more chill, while Carpenter Brut’s Hang’em all is more powerful and aggressive. Hang’em all contains both high- and low-intensity sections. The first 50 seconds of the track are intense, followed by a low-intensity subsection. This is clearly visible in the tempogram, which becomes less noisy and shows a more consistent line after these first 50 seconds. From 85 to 120 seconds there is a more intense subsection, followed by a low-intensity part that blends into a high-intensity one. This is unexpected, because in the plot the whole section looks like the first 50 seconds of the low-intensity section.
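A tempogram shows, for each moment in a track, how strongly each candidate tempo is present. One standard variant, the Fourier tempogram, applies a windowed Fourier transform to a novelty (onset-strength) curve. The sketch below assumes the novelty curve is already given (how it is derived from the audio is left out), and the window length, hop size and BPM range are illustrative choices.

```python
import cmath
import math


def fourier_tempogram(novelty, frame_rate, bpms, window_len, hop):
    """Fourier tempogram of a novelty (onset-strength) curve.

    novelty: values sampled at `frame_rate` frames per second.
    bpms: candidate tempi; window_len and hop are in frames.
    Returns one column per analysis window, each a list of magnitudes
    aligned with `bpms` (a large magnitude means a strong periodicity).
    """
    # Hann window to reduce spectral leakage at the window edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_len - 1)) for n in range(window_len)]
    columns = []
    for start in range(0, len(novelty) - window_len + 1, hop):
        segment = novelty[start:start + window_len]
        column = []
        for bpm in bpms:
            freq_hz = bpm / 60.0
            coeff = sum(
                w * x * cmath.exp(-2j * math.pi * freq_hz * (start + n) / frame_rate)
                for n, (w, x) in enumerate(zip(hann, segment))
            )
            column.append(abs(coeff))
        columns.append(column)
    return columns


# Toy novelty curve: one onset every 0.5 s (120 BPM) for 10 s at 100 frames/s.
frame_rate = 100
novelty = [1.0 if i % 50 == 0 else 0.0 for i in range(10 * frame_rate)]
bpms = list(range(60, 201))
tempogram = fourier_tempogram(novelty, frame_rate, bpms, window_len=400, hop=200)
for column in tempogram:
    print(bpms[column.index(max(column))])  # 120 in every window
```

Tempograms often also show energy at simple multiples or ratios of the main tempo. 158 BPM is close to 1.5 × 105 = 157.5 BPM, so the weaker line in Nexus may be such a related tempo rather than a separate one, though the plot alone cannot confirm that.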
Comparing
Hang’em all consistently has a higher BPM than Nexus, which was to be expected: while Hang’em all does contain relaxed parts, it always feels faster paced than Nexus. The lowest-intensity and buildup phases of both songs have BPMs that are not clearly defined in the plot, but once the main parts of both songs start, the BPM line becomes very sharp and accurate. Average BPM seems to be a reasonable and usable statistic to differentiate between Carpenter Brut and Downtown Binary.
K-Nearest Neighbors
After all the previous analysis, it is finally time to see whether the research question holds. This is tested using k-nearest neighbors (KNN). The algorithm uses all the Spotify API features we have become accustomed to in the previous analysis to predict the artist of each track. To validate the results, we use 5-fold cross-validation: the model is trained on 80% of the data and validated on the remaining 20%, and this is repeated until every 20% slice of the corpus has been in the validation partition once. This gives a more honest, non-cherry-picked picture of the results.
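The procedure described above can be sketched in plain Python. This is not the code behind the actual results: the feature set, k = 3, Euclidean distance and z-score standardization are assumptions, and the two-feature data and artist names are hypothetical. Standardizing matters because loudness (in dB) would otherwise dominate features on a 0 to 1 scale.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def standardize(X):
    """Z-score each feature column so no single feature dominates the distance.

    For brevity this uses statistics of the whole dataset; a stricter setup
    would compute them on the training folds only.
    """
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training tracks closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_predict(X, y, k=5, n_folds=5, seed=0):
    """Out-of-fold KNN prediction for every track (n_folds-fold cross-validation)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]  # each track in exactly one fold
    predictions = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            predictions[i] = knn_predict(train_X, train_y, X[i], k)
    return predictions


# Hypothetical (energy, valence) data for two well-separated "artists".
X = [[0.9, 0.2 + 0.01 * i] for i in range(10)] + [[0.5, 0.8 + 0.01 * i] for i in range(10)]
y = ["Artist A"] * 10 + ["Artist B"] * 10
preds = cross_val_predict(standardize(X), y, k=3)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"accuracy: {accuracy:.2f}")
```

Every track gets exactly one prediction from a model that never saw it during training, which is what makes the per-artist counts in the results comparable.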
Results
When interpreting the results, keep in mind that the maximum score per artist is 20, since the corpus contains 20 tracks by each artist. The KNN classifier is quite accurate when predicting the artist. It achieves the best results for Carpenter Brut, likely because Carpenter Brut contrasts the most with the other artists. It also performs decently on Jan Hammer and The Midnight, while Downtown Binary and Home seem to be more difficult.
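The results are easiest to read as a confusion matrix, where each row is a true artist and each column a predicted artist. Each row sums to the number of tracks by that artist, which is why a diagonal cell can be at most 20 in this corpus. A minimal sketch with made-up predictions (the function name `confusion_matrix` is hypothetical here, not a specific library's API):

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Row = true artist, column = predicted artist, cell = number of tracks."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical predictions for 5 tracks, not the real results.
labels = ["Carpenter Brut", "Home"]
actual = ["Carpenter Brut"] * 3 + ["Home"] * 2
predicted = ["Carpenter Brut", "Carpenter Brut", "Home", "Home", "Carpenter Brut"]
matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row, "correct:", row[labels.index(label)], "of", sum(row))
```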